perm filename FIND.DON[UP,DOC]1 blob sn#431702 filedate 1979-04-11 generic text, type T, neo UTF8
FIND is a system command that causes a search for a specified key in a
specified file.  The FIND command has the following syntax:

	FIND[ WITHIN <delim>] <key>[ OMIT[TING] <omits>][ IN <file>]

where [] indicates optional elements.  There are also DFIND and OFIND
commands with the same syntax (except for the command name).  <key> is
the string of characters to be found in the file <file>, <omits> are
characters to be ignored in the file, and <delim> is either a single
character or one of the following words:  MSG LINE PAGE PARAGRAPH GRAF.

Now for some details.


FILE TO BE SEARCHED

The default file to be searched is the lab phone directory.  If you use
the DFIND command, the default is the unabridged dictionary word list.
You can specify your own default via the OFIND command (see below).

If you specify a file other than the default, certain special names are
recognised:  PHONE gets you the phone directory (without having to type
its full name), DICT gets you the dictionary, and ∂ (partial sign) gets
you your mail file.  You can optionally follow the ∂ with (1) a
programmer name to specify a mail file other than your own, or an
asterisk (*) to specify the system message mail file NOTICE.TXT, and/or
(2) an extension to specify one other than .MSG, and/or (3) a PPN to
specify one other than [2,2].


WHAT GETS PRINTED

When a match is found, the "unit" of the file that includes the LAST
character of the key is printed, where the "unit" is determined by the
<delim>.  The default unit is the PARAGRAPH except for the DFIND
command, for which the default is the LINE.  If the partial-sign (mail)
filename specifier is used, the default unit is MSG.  You can override
these defaults using OFIND or the WITHIN clause (see syntax above).  The
delimiter <delim> can be specified as a single character or it can be
specified by one of the following exact names:

	MSG  LINE  PAGE  PARAGRAPH  GRAF

where MSG means that partial sign (∂) is the delimiter (designed for use
with mail files), LINE means that the end of a line is the delimiter
(i.e., only the line on which the key ends will be typed), PAGE means
that formfeed (a pagemark) is the delimiter, and PARAGRAPH or GRAF means
that a blank line is the delimiter.

The delimiter character will be treated as a delimiter only if it occurs
as the first character on a line; a line starting with the delimiter
character is considered the first line in a new text unit, and the
previous line is the last line in the previous text unit.  Text units
may span page boundaries; the pagemarks are not printed.
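
The delimiter rule above can be sketched in Python.  This is only an
illustration, not FIND's own code: the name split_units is hypothetical,
and delim=None is an assumed stand-in for PARAGRAPH (a blank line starts
the next unit).  Pagemark handling is left out.

```python
def split_units(lines, delim=None):
    """Group lines into text units.

    A line that begins with `delim` is the first line of a new unit.
    With delim=None (standing in for PARAGRAPH), a blank line plays
    that role instead.
    """
    units = []
    current = []
    for line in lines:
        if delim is None:
            starts_unit = (line == "")
        else:
            starts_unit = line.startswith(delim)
        if starts_unit and current:
            units.append(current)      # previous line ended the previous unit
            current = []
        current.append(line)
    if current:
        units.append(current)
    return units
```

For a mail file, split_units(lines, "∂") yields one list of lines per
message, each beginning with its own "∂" header line.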

Within a single delimited text unit, up to about 25 lines are typed out
both before and after the line in which the key is found.  If more than 25
lines occur before and/or after the key but within the delimited text
area, an ellipsis (. . .) will be typed out before the first line typed
out and/or after the last line typed out.  Further occurrences of the
key within the same text unit will not be detected unless the 25-line
limit is exceeded and the key occurs entirely after the last line typed.
Normally (i.e., if the text unit ends within 25 lines after the key)
the search picks up starting with the CRLF (carriage return line feed)
ending the last line printed (thus the CRLF can be used as part of the
key to search for occurrences only at the beginnings of lines).

Each separate text unit containing the key is a "HIT" and is typed out,
with each line of text preceded by an asterisk (*) except that the line
in which the key occurs is preceded by a greater-than sign (>).  Also,
the hits are counted and the count is printed after the whole file has
been searched.  (Multiple hits within a single text unit may occur; see
preceding paragraph.)  In the printout the hits are separated by blank
lines.  EXCEPTION: If the <delim> is LINE, then no blank lines are
inserted and the "*" and ">" prefixes are omitted.
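
As a rough sketch of the printout rules in this section (not the actual
FIND code), the following Python function formats one hit.  The names
format_hit and CONTEXT are hypothetical; the exact 25-line cutoff and the
placement of the ellipsis on a line of its own are assumptions, since the
text says only "about 25 lines".

```python
CONTEXT = 25  # "about 25 lines" per the text; the exact value is assumed

def format_hit(unit, key_line, line_mode=False):
    """Return the lines typed out for one hit.

    unit      -- list of lines making up the text unit
    key_line  -- index in `unit` of the line on which the key ends
    line_mode -- True when <delim> is LINE (no "*" or ">" prefixes)
    """
    if line_mode:
        return [unit[key_line]]
    first = max(0, key_line - CONTEXT)
    last = min(len(unit), key_line + CONTEXT + 1)
    out = []
    if first > 0:
        out.append(". . .")            # more of the unit precedes the printout
    for i in range(first, last):
        prefix = ">" if i == key_line else "*"
        out.append(prefix + unit[i])
    if last < len(unit):
        out.append(". . .")            # more of the unit follows the printout
    return out
```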


WHAT GETS SEARCHED FOR

Here's where things get interesting.  Within the <key>, certain
characters have special interpretations, as listed below:

     comma	Separates two strings to be searched for simultaneously;
		that is, FIND FOO,BAR,BAZ will search for FOO, BAR, and
		BAZ.  Simultaneous searches like this take no more (or
		less) time than searching for a single string.
    letter	Matches either upper- or lower-case in the file.
     'xxx	Character with ASCII code xxx (octal); e.g., '044=$.
		FIND q'015 will look for lines ending with "q" (or "Q").
     {xyz}	Any of the characters xyz; any number of characters may
		be given between the braces, and they may include any of
		the constructs listed here except comma, infinity, or
		another `embraced' string.  For instance, FIND ≡M{s¬∃≡,}
		searches for an upper-case M followed by either a
		lower-case s or an upper-case S or a CR, LF, tab, space,
		formfeed, or comma (see below regarding "≡", "¬", "∃").
       ∀	Any character.
       ∃	Any character except CR, LF, tab, space, or formfeed.
      ¬x	Any character except x (x can be a multi-character
		construct such as {xyz} or ∃).
      ≡x	The character x (used to quote these special chars;
		can also be used to quote a letter to enforce either
		upper- or lower-case).
      ∞x	Any number (including zero) of repetitions of x (x can
		be a multi-character construct; see examples below).
     space	Equivalent to ∞¬∃, i.e., zero or more spaces, tabs,
		CRLFs, and formfeeds; to match precisely one space,
		quote the space with `≡'.

Note: The time taken by the search is independent of the complexity of
the key.  (This does not consider the time taken to print the hits.)
For example, searching for any key in the dictionary (about 2.9 million
characters of text) takes about 6.5 seconds of Ebox time.
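
To make the special characters concrete, here is a hedged Python sketch
that translates a <key> into a pattern for Python's re module.  It is not
how FIND works internally: Python's backtracking matcher is swapped in,
so its running time does depend on the key.  The names key_to_regex,
_atom and _cls are hypothetical, and the text is assumed to be 7-bit
ASCII.

```python
import re

WHITE = frozenset("\r\n\t \f")                 # CR, LF, tab, space, formfeed
ALL = frozenset(chr(c) for c in range(128))    # assumption: 7-bit ASCII text

def _atom(key, i, in_braces=False):
    """Parse one single-character construct at key[i].

    Returns (set of characters it matches, index just past the construct).
    """
    c = key[i]
    if c == "≡":                               # quoted character, exact case
        return frozenset(key[i + 1]), i + 2
    if c == "'":                               # 'xxx: three octal digits
        return frozenset(chr(int(key[i + 1:i + 4], 8))), i + 4
    if c == "∀":                               # any character
        return ALL, i + 1
    if c == "∃":                               # any non-blank character
        return ALL - WHITE, i + 1
    if c == "¬":                               # anything except the next construct
        inner, j = _atom(key, i + 1, in_braces)
        return ALL - inner, j
    if c == "{" and not in_braces:             # any of the enclosed constructs
        chars, j = frozenset(), i + 1
        while key[j] != "}":
            inner, j = _atom(key, j, True)
            chars |= inner
        return chars, j + 1
    if c.isalpha():                            # a letter matches either case
        return frozenset((c.lower(), c.upper())), i + 1
    return frozenset(c), i + 1                 # any other character is literal

def _cls(chars):
    """Render a character set as a regex class built from hex escapes."""
    if not chars:
        return "(?!)"                          # empty set: can never match
    return "[" + "".join("\\u%04x" % ord(ch) for ch in sorted(chars)) + "]"

def key_to_regex(key):
    """Translate a FIND-style <key> into a Python regular expression."""
    alternatives, parts, i = [], [], 0
    while i < len(key):
        c = key[i]
        if c == ",":                           # comma: simultaneous keys
            alternatives.append("".join(parts))
            parts = []
            i += 1
        elif c == " ":                         # bare space is the same as ∞¬∃
            parts.append(_cls(WHITE) + "*")
            i += 1
        elif c == "∞":                         # zero or more of the next construct
            chars, i = _atom(key, i + 1)
            parts.append(_cls(chars) + "*")
        else:
            chars, i = _atom(key, i)
            parts.append(_cls(chars))
    alternatives.append("".join(parts))
    return "|".join("(?:%s)" % a for a in alternatives)
```

With this sketch, re.search(key_to_regex("k∞{aeiou}k"), "kook") finds a
match, while the same pattern finds nothing in "kiosk".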


THE "OMIT" CLAUSE

The OMIT (or OMITTING) clause in the syntax above lets you specify
certain characters to be ignored during the search.  The default is
'012'000, i.e., ignore linefeeds and nuls.  (Thus carriage returns may
be used as single-character delimiters around lines.)  The <omits>
string takes precedence over the <key>; i.e., FIND XYZ OMITTING Y is
guaranteed to find zero hits.

The <omits> string may include any of the special constructs listed
above for the <key>, except for the "∞" and "space" constructs.
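
The effect of the OMIT clause can be pictured as dropping the omitted
characters from the text before any matching is done.  The Python sketch
below assumes exactly that and treats <omits> as a plain list of literal
characters (the special constructs are not handled); apply_omits and
DEFAULT_OMITS are hypothetical names.

```python
DEFAULT_OMITS = "\n\0"   # '012'000: linefeed and NUL

def apply_omits(text, omits=DEFAULT_OMITS):
    """Drop every omitted character from the text before it is searched."""
    return "".join(ch for ch in text if ch not in omits)
```

With the default omits, "Iraq\r\nnext\r\n" becomes "Iraq\rnext\r", which
is why a key such as q'015 matches a "q" at the end of a line.  Omitting
"Y" turns "XYZ" into "XZ", so the key XYZ can never match.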


THE "OFIND" COMMAND

If you use OFIND instead of FIND in the syntax above, it is exactly like
FIND except that the OPTION.TXT file on your login area is scanned for
a line beginning "FIND:" (case of letters is ignored) and, if found,
the line is used to override various defaults.  The format for the
line is:

	FIND:[ WITHIN <delim>][ OMIT[TING] <omits>][ IN <file>][;]

Any fields not specified in OPTION.TXT retain their usual defaults as
defined in the preceding sections.  As a special case for compatibility
with an earlier version of FIND, the line may read:

	FIND:<file>

to specify a default file without affecting the <delim> and <omits>.
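
The following Python sketch shows one way the "FIND:" line could be
picked out of OPTION.TXT and split into its fields.  It is an
illustration under stated assumptions, not the program's real parser:
find_defaults is a hypothetical name, <delim> and <omits> are assumed to
contain no blanks, and any line that does not fit the clause format is
taken to be the older FIND:<file> form.

```python
import re

# Assumption: <delim> and <omits> contain no blanks, and <file> runs up to
# an optional trailing semicolon.
_OPTION = re.compile(
    r"(?: WITHIN (?P<delim>\S+))?"
    r"(?: OMIT(?:TING)? (?P<omits>\S+))?"
    r"(?: IN (?P<file>[^;]+))?;?$",
    re.IGNORECASE,
)

def find_defaults(option_text):
    """Return the defaults given by the first 'FIND:' line, as a dict.

    Only the fields actually present appear in the result; the caller
    keeps the usual defaults for the rest.
    """
    for line in option_text.splitlines():
        if line[:5].upper() != "FIND:":        # case of letters is ignored
            continue
        rest = line[5:].rstrip()
        m = _OPTION.match(rest)
        if m is None:
            return {"file": rest.strip()}      # older form: FIND:<file>
        return {k: v for k, v in m.groupdict().items() if v is not None}
    return {}
```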


EXAMPLES

	FIND [LES:

will print out the entry for LES in the phone directory.  (The "[" and
":" keep it from finding arbitrary words that happen to contain the
string "les", since in the directory the programmer name field is
surrounded by those characters.)

	FIND garply baz in ∂

will search your own mail file for "garply baz" and type out the
entire message(s) it occurs in.

	FIND ≡	ME in ∂*
    or	FIND '011ME in ∂*

(that's ≡<tab>ME in the first one) will find all system messages (in
NOTICE.TXT[2,2]) from ME.

	FIND president in ∂.nap

will search your News Service notification file on [2,2] for all
notifications containing "president".

	FIND RUN IN IN IN

will search for "RUN" (ignoring case) in a file with the unlikely name
"IN IN".  The ≡ construct can be used to override this, as in

	FIND RUN≡ IN IN IN

which searches for the phrase RUN IN in the file named "IN", since the
quoted space prevents the first "IN" from being taken as a file-name
lead-in.
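
The two commands above are consistent with a rule that the first " IN "
whose leading space is not quoted by "≡" ends the key.  The Python sketch
below implements that inferred rule only; it is a guess at the behaviour,
ignores the WITHIN and OMIT clauses, and split_key_and_file is a
hypothetical name.

```python
import re

def split_key_and_file(args):
    """Split 'KEY IN FILE' at the first unquoted ' IN ' (case ignored).

    Returns (key, file), with file set to None when no IN clause is found.
    """
    # The lookahead keeps the trailing space unconsumed, so back-to-back
    # candidates such as " IN IN " are each examined.
    for m in re.finditer(r" IN(?= )", args, re.IGNORECASE):
        if m.start() > 0 and args[m.start() - 1] == "≡":
            continue                   # quoted space: still part of the key
        return args[:m.start()], args[m.end() + 1:]
    return args, None
```

Here split_key_and_file("RUN IN IN IN") gives ("RUN", "IN IN"), and
split_key_and_file("RUN≡ IN IN IN") gives ("RUN≡ IN", "IN").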

	DFIND k∞{aeiou}k

will search for all words in the dictionary containing two k's (upper-
or lower-case) with nothing but vowels between them.

	DFIND ¬∃≡a∃∃∞∃{pt}¬{aeiou¬∃}

will search for all words beginning with (due to the initial ¬∃, which
matches only a CR, LF, tab, space, or formfeed) a lower-case `a',
followed by two or more non-delimiting characters, followed by either a
`p' or a `t', and then any non-delimiter other than a vowel.

More example commands:

FIND MCCARTHY
FIND TARGET BYTE in COMLIN.FAI
FIND LOMA VERDE
FIND EVENT IN ∂*
FIND ≡ E≡  IN ∂*
FIND WITHIN MSG PDP-11 IN OUTGO.MSG
FIND WITHIN PAGE UDPUFD IN MONCOM.BH[S,DOC
DFIND weird,wierd
FIND john∞∀lathrop

Example output for the last example command:

*McCarthy, John (Prof. John)  - Professor                [JMC: FR*,DIAL*,VB] 2000 200
*        AI207; 7-4430                                         Sep 4
>        846 Lathrop Dr., Stanford 94305, 321-7580

1 hit on key "john∞∀lathrop".


USING DFIND FROM E

The E editor has an extended command ⊗X DFIND that interfaces with the
FIND program.  (Note that the ⊗X FIND command in E is something else
entirely!  OFIND is not available from E.)  E's DFIND command starts up
a phantom job to do the search, and a summary of the results is sent to
your terminal.  The summary includes the number of hits, as well as the
first, last, and (if different) shortest "hit" lines.  The syntax of the
DFIND command in E is the same as that of the monitor command, with the
same defaults for the WITHIN, OMIT, and IN clauses.  (If you specify a
WITHIN other than the default of WITHIN LINE, only the lines containing
the hit will be reported.  This isn't quite the same as WITHIN LINE,
since a text unit containing several matches yields only one hit.)

If you don't give anything on the DFIND command line in E (i.e., if you
just type ⊗X DFIND<cr>), the current line of text (or the first line of
attached text, if any) will be used to specify the search parameters.


FINAL NOTES

If you type just FIND (or DFIND or OFIND) without providing a <key>,
you'll get a summary of the command syntax and special <key> constructs.

The special sequences currently recognised within the <key> string allow
specification of a subset of regular expressions.  The main loop of the
FIND program could just as easily handle any regular expression without
slowing down any, but I don't feel like writing a parser for the darn
things.  If someone wants to provide a SAIL program that converts
regular expressions into a transition table for a finite state machine,
I'll consider building it into FIND.

Send comments, questions, gripes, etc., via GRIPE FIND.